feat(knowledge): add embedding model selection and Cohere reranker #4349
waleedlatif1 merged 22 commits into staging
Conversation
PR Summary (Medium Risk)
Overview: Vector search gains optional Cohere reranking (
Reviewed by Cursor Bugbot for commit 8fd0557.
Greptile Summary
This PR adds multi-model embedding support (

Confidence Score: 5/5. Safe to merge; all P1 findings from prior review rounds are addressed and only minor P2 style issues remain. No P0 or P1 issues found in the current HEAD. The two inline comments are P2: one about a confusing cost.input field that conflates embedding and reranker costs (the billing total is correct), and one about a missing isNull(deletedAt) guard in a chunk-update KB lookup (practical impact is negligible). All substantive bugs flagged in previous review rounds have been fixed.

Important files changed: apps/sim/app/api/knowledge/search/route.ts (cost.input conflation); apps/sim/lib/knowledge/chunks/service.ts (updateChunk deletedAt guard)
Sequence Diagram

sequenceDiagram
participant Client
participant SearchRoute as SearchRoute
participant EmbeddingAPI as EmbeddingProvider
participant VectorDB as pgvector
participant CohereAPI as CohereRerank
Client->>SearchRoute: POST query + knowledgeBaseIds + rerankerEnabled
SearchRoute->>SearchRoute: checkKnowledgeBaseAccess resolves embeddingModel per KB
SearchRoute->>SearchRoute: reject if mixed embeddingModels and hasQuery
SearchRoute->>EmbeddingAPI: generateSearchEmbedding using KB embeddingModel
EmbeddingAPI-->>SearchRoute: queryVector 1536-dim
SearchRoute->>VectorDB: vector search candidateTopK = min(100, topK x 4)
VectorDB-->>SearchRoute: candidates
alt rerankerEnabled and hasQuery and results not empty
SearchRoute->>CohereAPI: rerank query + documents up to 100
CohereAPI-->>SearchRoute: ranked index + relevance_score
SearchRoute->>SearchRoute: filter invalid indices and map to SearchResult
SearchRoute->>SearchRoute: set rerankBilled = true and compute rerankerCost
end
SearchRoute->>SearchRoute: compute embedding cost via calculateCost
SearchRoute-->>Client: results with optional rerankerScore and cost breakdown
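The candidate-pool sizing in the diagram can be sketched as follows (hypothetical helper name; the route's real identifiers may differ). The idea is to over-fetch from pgvector so the reranker sees more candidates than the caller's topK, while staying under Cohere's 100-document rerank limit:

```typescript
// Cohere's rerank endpoint accepts at most 100 documents per call.
const RERANK_MAX_DOCUMENTS = 100;

// Over-fetch vector-search candidates for reranking: min(100, topK * 4),
// matching the step shown in the sequence diagram.
function candidateTopK(topK: number): number {
  return Math.min(RERANK_MAX_DOCUMENTS, topK * 4);
}

console.log(candidateTopK(10)); // 40
console.log(candidateTopK(50)); // 100
```

After reranking, only the top `topK` of the reranked candidates are returned to the client, each carrying its `relevance_score` as `rerankerScore`.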
Reviews (12). Last reviewed commit: "fix(knowledge): prefer singular Cohere k..."
Addressed both P1 findings from Greptile in 5d1446f:
1. MDX docs literal (line 51) — replaced
2. Reranker billing gap on 0-result responses — separated
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@greptile

@cursor review
…g model

Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with text-embedding-3-large would be embedded by whatever model that deployment serves while billing tracked 3-large pricing — and chunks ingested via Azure versus queried via real OpenAI would land in mismatched vector spaces.

Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model. Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy). If no deployment is configured for the chosen model, route to direct OpenAI instead of silently routing to the wrong deployment.

Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult so the build passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.
Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
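The BYOK-aware shape described above can be sketched like this (simplified: real workspace-key lookup and env rotation are omitted, and the function names are taken from the commit message, not verified against the source):

```typescript
interface ResolvedKey {
  apiKey: string;
  isBYOK: boolean; // true when the workspace supplied its own Cohere key
}

// Prefer the workspace's own key (BYOK); otherwise fall back to the platform key.
function resolveCohereKey(
  workspaceKey: string | undefined,
  platformKey: string
): ResolvedKey {
  if (workspaceKey) return { apiKey: workspaceKey, isBYOK: true };
  return { apiKey: platformKey, isBYOK: false };
}

// Billing gate: platform reranker cost applies only when the platform key was used.
function rerankerCostToBill(isBYOK: boolean, cost: number): number {
  return isBYOK ? 0 : cost;
}

console.log(resolveCohereKey('ws-key', 'platform-key')); // { apiKey: 'ws-key', isBYOK: true }
console.log(rerankerCostToBill(true, 0.002)); // 0
```

This mirrors how generateEmbeddings gates embedding costs on BYOK status, so the two billing paths stay consistent.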
@greptile

@cursor review
…dead var

Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model. For gemini-embedding-001 the cost was computed against an OpenAI-tokenized count, producing wrong input.tokens.prompt and (slightly) wrong cost. Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@cursor review

@greptile
Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer when computing the stored tokenCount. For KBs using gemini-embedding-001 the count was estimated with the wrong heuristic, leading to inaccurate stored counts (and any billing derived from them). Now derive the tokenizer from the KB's embedding model provider, matching the search route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
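The model-to-tokenizer derivation these fixes converge on can be sketched as (illustrative registry; the PR ultimately stores tokenizerProvider on EmbeddingModelInfo rather than using a standalone map):

```typescript
type TokenizerProvider = 'openai' | 'google';

// Map each supported embedding model to the tokenizer used for its
// token-count estimates, so stored counts and billing match the model.
const EMBEDDING_TOKENIZERS: Record<string, TokenizerProvider> = {
  'text-embedding-3-small': 'openai',
  'text-embedding-3-large': 'openai',
  'gemini-embedding-001': 'google',
};

function tokenizerFor(embeddingModel: string): TokenizerProvider {
  // Default to 'openai' for unknown/legacy models.
  return EMBEDDING_TOKENIZERS[embeddingModel] ?? 'openai';
}

console.log(tokenizerFor('gemini-embedding-001')); // google
console.log(tokenizerFor('text-embedding-3-small')); // openai
```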
… routes

- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@greptile

@cursor review
URLs end up in server access logs, proxy logs, and APM tools, so embedding the key as a query param risks accidental exposure. Google explicitly recommends the header form for the Gemini REST API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
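A minimal sketch of the header-based form (request shape simplified; only the `x-goog-api-key` header is the point here):

```typescript
// Build request options for a Gemini REST call with the API key in the
// x-goog-api-key header instead of a ?key= query parameter, so the key
// never appears in the URL (and thus never in access/proxy/APM logs).
function geminiRequestInit(apiKey: string, text: string) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': apiKey, // header form recommended by Google
    },
    body: JSON.stringify({ content: { parts: [{ text }] } }),
  };
}

const init = geminiRequestInit('my-secret-key', 'hello world');
console.log(init.headers['x-goog-api-key']); // my-secret-key
```

The URL itself (e.g. `.../models/gemini-embedding-001:embedContent`) then carries no credential at all.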
@greptile

@cursor review
Restore the prior fallback so existing Azure deployments — which conventionally name the deployment after the model — continue to route through Azure when KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell through to direct OpenAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
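The restored fallback chain amounts to (hypothetical helper; env access is passed in for illustration):

```typescript
// Resolve the Azure OpenAI deployment name: prefer the explicit
// KB_OPENAI_MODEL_NAME env var; otherwise assume the common convention
// of naming the deployment after the embedding model itself.
function azureDeploymentName(
  embeddingModel: string,
  kbOpenAiModelName: string | undefined
): string {
  return kbOpenAiModelName ?? embeddingModel;
}

console.log(azureDeploymentName('text-embedding-3-small', undefined)); // text-embedding-3-small
console.log(azureDeploymentName('text-embedding-3-small', 'my-deployment')); // my-deployment
```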
@greptile

@cursor review
…API_KEY fallback

- Gemini's batchEmbedContents API rejects requests with more than 100 items. The token-based batcher could pack hundreds of short chunks into a single request, causing 400s. Add maxItemsPerRequest on ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
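The item-count split described in the first bullet can be sketched as a second pass over the token-based batches (simplified; the real code hangs maxItemsPerRequest off ResolvedProvider):

```typescript
// Split one token-sized batch into sub-batches of at most
// maxItemsPerRequest items, so e.g. Gemini's 100-item limit on
// batchEmbedContents is never exceeded even for very short chunks.
function splitBatch<T>(batch: T[], maxItemsPerRequest: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < batch.length; i += maxItemsPerRequest) {
    out.push(batch.slice(i, i + maxItemsPerRequest));
  }
  return out;
}

const chunks = Array.from({ length: 250 }, (_, i) => `chunk-${i}`);
console.log(splitBatch(chunks, 100).map((b) => b.length)); // [ 100, 100, 50 ]
```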
@greptile

@cursor review
Match resolveOpenAIKey/resolveGeminiKey order: check the singular COHERE_API_KEY before falling back to rotating keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
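The shared resolution order across the three providers looks roughly like this (hypothetical generic helper; env is passed in for testability, and key rotation is modeled as a random pick):

```typescript
// Resolve an API key: the singular env var wins (e.g. COHERE_API_KEY),
// then fall back to picking one of the rotating numbered keys
// (e.g. COHERE_API_KEY_1/2/3). Returns undefined if nothing is set.
function resolveApiKey(
  env: Record<string, string | undefined>,
  name: string
): string | undefined {
  if (env[name]) return env[name];
  const rotating = [1, 2, 3]
    .map((n) => env[`${name}_${n}`])
    .filter((k): k is string => Boolean(k));
  if (rotating.length === 0) return undefined;
  return rotating[Math.floor(Math.random() * rotating.length)];
}

console.log(resolveApiKey({ COHERE_API_KEY: 'single' }, 'COHERE_API_KEY')); // single
```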
@greptile

@cursor review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
…4349)

* feat(knowledge): add embedding model selection and Cohere reranker
* fix(knowledge): split reranker model constants into client-safe module
* fix(knowledge): bill rerank on every successful API call and fix MDX docs literal
* test(knowledge): align embedding tests with provider abstraction changes
* fix(knowledge): require explicit Azure deployment per OpenAI embedding model
* fix(knowledge): skip platform reranker billing for BYOK Cohere keys
* fix(knowledge): match search tokenizer to embedding provider; remove dead var
* fix(knowledge): match chunk tokenizer to KB embedding provider
* refactor(knowledge): centralize tokenizer mapping on EmbeddingModelInfo
  Add tokenizerProvider directly to EmbeddingModelInfo so callers read it from the registry instead of reimplementing the gemini→google / openai→openai map at each call site. Removes the local helper in chunks/service.ts and the inline ternary in search/route.ts.
* refactor(knowledge): lock embedding model to KB_EMBEDDING_MODEL env var
  Remove the user-facing model picker from the KB create modal and the embeddingModel field from the create/update API schemas. The active model is now selected server-side via KB_EMBEDDING_MODEL, which collapses Azure routing to a single deployment (KB_OPENAI_MODEL_NAME) and drops the per-model AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_* env vars and SUPPORTED_EMBEDDING_MODEL_IDS / UI-only label+description registry fields.
* fix(knowledge): use provider tokenizer for chunks and bound rerank indices
  documents/service.ts: replace ceil(len/4) heuristic with estimateTokenCount using the embedding model's tokenizerProvider so token counts match billing. reranker.ts: filter Cohere rerank results to valid indices before mapping to defend against malformed responses. utils.test.ts: add embeddingModel to kb fixture so getEmbeddingModelInfo resolves.
* fix(knowledge): use .count from estimateTokenCount return value
  estimateTokenCount returns a TokenEstimate object, not a number — access .count so the integer token count is stored instead of an object.
* fix(knowledge): only enforce single embedding model when query is present
  Tag-only searches don't generate a query embedding, so two KBs with different embedding models can be filtered together. Gate the guard on hasQuery so cross-model tag-only queries no longer 400.
* fix(knowledge): use getConfiguredEmbeddingModel in copilot KB creation
  Copilot-created KBs were hardcoded to text-embedding-3-small, ignoring KB_EMBEDDING_MODEL. This caused cross-KB searches mixing copilot- and API-created KBs to hit the embedding-model-mismatch guard.
* fix(knowledge): make EMBEDDING_DIMENSIONS a literal type
  CreateKnowledgeBaseData.embeddingDimension is typed as the literal 1536, so EMBEDDING_DIMENSIONS needs `as const` to satisfy it after the copilot path switched to passing the constant.
* fix(knowledge): use per-KB embedding model in v1 search route
  The v1 search endpoint was passing undefined to generateSearchEmbedding, which silently fell back to text-embedding-3-small. KBs created while KB_EMBEDDING_MODEL=gemini-embedding-001 (or any non-default) would have their queries embedded with the wrong model. Now resolves the model from the KB rows like the internal route, with the same multi-model guard.
* chore(knowledge): polish embedding/reranker implementation
  Drop unused supportsCustomDimensions from EmbeddingModelInfo (every registered model supports it; OpenAI/Azure paths now always send dimensions: 1536). Type SUPPORTED_EMBEDDING_MODELS as Partial<Record<...>> so index lookups surface as possibly-undefined in the type system instead of relying on runtime null checks alone. Require AZURE_OPENAI_API_VERSION in the Azure routing gate; missing api-version no longer slips through as ?api-version=undefined and instead falls back to direct OpenAI. Use the embedding provider's tokenizer (estimateTokenCount) for the Gemini fallback token estimate instead of len/4, so billing matches the model's tokenization. Drop the unreachable 'text-embedding-3-small' fallback in the manual chunk upload route — accessCheck.knowledgeBase is non-null after the access guard. docs-chunker now reads getConfiguredEmbeddingModel() so Sim's docs ingestion respects KB_EMBEDDING_MODEL like the user-facing paths. Add a v1 search route test covering per-KB model resolution and the cross-KB mixed-model rejection.
* fix(knowledge): resolve type errors and unhandled rejection in search routes
* fix(knowledge): pass Gemini API key via x-goog-api-key header
* fix(knowledge): default Azure deployment name to embedding model name
* fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback
* fix(knowledge): prefer singular Cohere key before rotation

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Summary
Type of Change
Testing
Tested manually
Checklist